Readable and Coherent MultiDocument Summarization
Abstract
Extractive summarization is the process of selecting a set of sentences from a corpus that can represent the original corpus in limited space. In addition to exhibiting good content coverage, the final summary should be readable as well as structurally and topically coherent. In this paper we present a holistic multi-document summarization approach that addresses content coverage, sentence ordering, topical coherence, topical order, and inter-sentence structural relationships. To achieve this, we introduce the novel concept of a Local Coherent Unit (LCU). Our results are comparable with those of peer systems for content coverage and sentence ordering, measured in terms of ROUGE and τ score respectively. In human evaluation, preference for the readability and coherence of our summaries is significantly higher than for other approaches. The approach also scales to larger real-time corpora.
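The abstract does not say how the τ score is computed. A common choice for evaluating sentence ordering is Kendall's τ, which compares a predicted ordering with a reference ordering by counting the sentence pairs that appear in reversed order. The sketch below assumes that definition; the function name `kendall_tau` and the use of integer sentence IDs are illustrative choices and are not taken from the paper.

```python
from itertools import combinations


def kendall_tau(reference_order, predicted_order):
    """Kendall's tau between two orderings of the same unique sentence IDs.

    Assumed definition: tau = 1 - 2 * inversions / (n * (n - 1) / 2).
    """
    if sorted(reference_order) != sorted(predicted_order):
        raise ValueError("both orderings must contain the same sentence IDs")
    n = len(reference_order)
    if n < 2:
        raise ValueError("need at least two sentences to compare orderings")

    # Position of each sentence in the predicted ordering.
    position = {sid: i for i, sid in enumerate(predicted_order)}

    # Count pairs whose relative order is reversed in the prediction.
    inversions = sum(
        1
        for a, b in combinations(reference_order, 2)
        if position[a] > position[b]
    )

    total_pairs = n * (n - 1) // 2
    return 1 - 2 * inversions / total_pairs


if __name__ == "__main__":
    print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # identical order: 1.0
    print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # fully reversed: -1.0
```

Under this definition the score ranges from -1 (fully reversed) to 1 (identical ordering), so a single swapped pair of adjacent sentences lowers the score only slightly.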
Similar Resources
Abstractive Multi-document Summarization by Partial Tree Extraction, Recombination and Linearization
Existing work on abstractive multidocument summarization utilises existing phrase structures directly extracted from input documents to generate summary sentences. These methods can suffer from a lack of consistency and coherence when merging phrases. We introduce a novel approach for abstractive multidocument summarization through partial dependency tree extraction, recombination and linearization...
Inferring Strategies for Sentence Ordering in Multidocument News Summarization
The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input art...
An Automatic Multidocument Text Summarization Approach Based on Naïve Bayesian Classifier Using Timestamp Strategy
Nowadays, automatic multidocument text summarization systems can successfully retrieve the summary sentences from the input documents. However, they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor coherence among the sentences, and redundancy. This paper introduces a new concept of timestamp approach with Naïve Bayesian Classification approach for multido...
A Proposed Textual Graph Based Model for Arabic Multi-document Summarization
The text summarization task is still an active area of research in natural language processing. Several methods proposed in the literature to solve this task have had mixed success. However, the methods developed for multi-document Arabic text summarization are based on extractive summaries, and none of them is oriented toward abstractive summaries. This is due to the challenges of...
An Integrated Multi-document Summarization Approach based on Word Hierarchical Representation
This paper introduces a novel hierarchical summarization approach for automatic multidocument summarization. By creating a hierarchical representation of the words in the input document set, the proposed approach is able to incorporate various objectives of multidocument summarization through an integrated framework. The evaluation is conducted on the DUC 2007 data set.
Journal title:
- Research in Computing Science
Volume 90
Publication date: 2015